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Monte Carlo Motion Planning for Robot 
Trajectory Optimization Under Uncertainty 


Lucas Janson^, Edward Schmerling^, and Marco Pavone 


Abstract This article presents a novel approach, named MCMP (Monte Carlo Mo¬ 
tion Planning), to the problem of motion planning under uncertainty, i.e., to the 
problem of computing a low-cost path that fulfills probabilistic collision avoidance 
constraints. MCMP estimates the collision probability (CP) of a given path by sam¬ 
pling via Monte Carlo the execution of a reference tracking controller (in this paper 
we consider LQG). The key algorithmic contribution of this paper is the design 
of statistical variance-reduction techniques, namely control variates and importance 
sampling, to make such a sampling procedure amenable to real-time implementa¬ 
tion. MCMP applies this CP estimation procedure to motion planning by iteratively 
(i) computing an (approximately) optimal path for the deterministic version of the 
problem (here, using the FMT* algorithm), (ii) computing the CP of this path, and 
(iii) inflating or deflating the obstacles by a common factor depending on whether 
the CP is higher or lower than a target value. The advantages of MCMP are three¬ 
fold; (i) asymptotic correctness of CP estimation, as opposed to most current ap¬ 
proximations, which, as shown in this paper, can be off by large multiples and hinder 
the computation of feasible plans; (ii) speed and parallelizability, and (iii) general¬ 
ity, i.e., the approach is applicable to virtually any planning problem provided that 
a path tracking controller and a notion of distance to obstacles in the configuration 
space are available. Numerical results illustrate the correctness (in terms of feasibil¬ 
ity), efficiency (in terms of path cost), and computational speed of MCMP. 
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1 Introduction 

Robotic motion planning is the problem of computing a path that connects an 
initial and a terminal robot state while avoiding collisions with obstacles and opti¬ 
mizing an objective function Qo). Despite the fact that finding a feasible, let alone 
optimal, solution to a motion planning problem is difficult (even the most basic ver¬ 
sions of the problem are already PSPACE-hard ll22l flOll l. in the past three decades 
key breakthroughs have made the solution to this problem largely practical (see Qol 
and references therein for a comprehensive historical account). Most works in the 
literature, however, focus on a deterministic setup where the state of a robot is per¬ 
fectly known and its actions lead to a deterministic, unique outcome. While this is 
usually an excellent approximation for robots operating in highly structured environ¬ 
ments (e.g., manipulators in an assembly line), it falls short in unstructured settings, 
e.g., for ground or aerial field robots or surgical robotic systems HD. In such cases, 
motion uncertainty, sensing uncertainty, and environment uncertainty may dramat¬ 
ically alter the safety and quality of a path computed via deterministic techniques 
(i.e., neglecting uncertainty). Hence, accounting for uncertainty in the planning pro¬ 
cess is regarded as an essential step for the deployment of robotic systems “outside 
the factory floor” HD- In this paper we introduce Monte Carlo Motion Planning, a 
novel approach to planning under uncertainty that is accurate, fast, and general. 

Related work: Conceptually, to enable a robot to plan its motion under uncer¬ 
tainty, one needs to design a strategy for a decision maker. In this regard, robotic 
motion planning can be formalized as a partially observable Markov decision pro¬ 
cess (POMDP) El, where the key idea is to assume that the state evolves according 
to a controlled Markov chain, the state is only partially observed, and one seeks to 
design a control policy that maps state probability distributions to actions. However, 
despite the theoretical El and practical El successes of the POMDP theory, the 
online computation of a control policy for robotic applications is extremely com¬ 
putationally intensive, and possibly even unnecessary as after a short time horizon 
the environment map may have changed cni. The alternative and widely adopted 
approach is then to restrict the optimization process to open-loop trajectories, which 
involves the much simpler task of computing a control sequence (as opposed to a 
control policy), and recompute the reference trajectory in a receding horizon fashion 
(e.g., every few seconds). This is the approach we consider in this paper. 

To select open-loop trajectories, a large number of works cast the problem into a 
chance-constrained optimization problem HI, where under the assumption of linear 
dynamics and convex obstacles, an open-loop control sequence is computed as the 
solution to a mixed-integer linear program. The works in ll29l fT9l extend this ap¬ 
proach to an optimization over the larger class of affine output feedback controllers, 
comprising a nominal control input and an error feedback term. These works, how¬ 
ever, require an explicit characterization of the obstacle space (in the configuration 
space), which is oftentimes unavailable ifTOl Chapter 5]. This has prompted a num¬ 
ber of researchers to extend the sampling-based motion planning paradigm to the 
problem of planning under uncertainty (in the sampling-based paradigm, an explicit 
construction of the conhguration space is avoided and the conhguration space is 
probabilistically “probed” with a sampling scheme ifTOl l. A common approach is 
to forgo path optimization and recast the problem as an unconstrained planning 
problem where the path collision probability (CP) is minimized. For example, the 
approach of LQG-MP El is to approximate a path CP by combining pointwise CPs 
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as if they were independent, running the rapidly-exploring random trees (RRT) na 
algorithm multiple times, and then selecting the path with minimum (approximate) 
path CR The pointwise CPs are computed within the model that a reference tracking 
controller is employed to track a nominal open-loop path. This is closely related to 
model predictive control (MFC) with closed-loop prediction uni and leads to a less 
conservative collision probability estimate than if the nominal control was executed 
without feedback. A similar approach is used in ll26l . where the authors employ a 
truncation method ||2T1 to improve the accuracy of path CP computation. 

The interplay between minding collision probability while simultaneously opti¬ 
mizing a path planning cost objective function is considered in Ezl, although still 
with an approximation to the path CP. There, cost optimization is considered over 
the set of path plans satisfying a lower bound on success probability. The inclusion 
of path CP as a constraint is also considered in m, where the authors propose 
CC-RRT, an RRT-based algorithm that approximates path CP via Boole’s bound. 
CC-RRT has been extended to include dynamic obstacles via Bayesian nonparamet- 
ric models fT), tailored to the control of unmanned aerial vehicles and parafoils 
ITh), and combined with the RRT* asymptotically optimal version of RRT ifTSl . 

Contributions: In this paper we present an algorithm for robot planning under 
uncertainty that returns high quality solutions (in terms of a general planning objec¬ 
tive) which satisfy a specified constraint on collision probability, or safety tolerance. 
The motivation of this work is that all of the aforementioned approaches approxi¬ 
mate path CP in ways that can be quite inaccurate, thus potentially drastically mis- 
characterizing the feasible domain of path optimization. In particular, we show (see 
Figure that those approximations can be off by many multiples in simple exam¬ 
ples. To address this problem, our hrst contribution is to design a variance-reduced 
(that is, quickly-converging) Monte Carlo (MC) probability estimation algorithm 
for CP computation. This algorithm estimates the collision probability of a given 
trajectory by sampling many realizations of a reference-tracking controller, model¬ 
ing the effort of a robot to follow a reference path. In particular, in this paper, we 
assume a Linear-Quadratic Gaussian (LQG) tracking controller, similar to LQG-MP 
B and MPC with closed-loop prediction ifTTl . Our algorithm does not suffer the in¬ 
accuracies of the approximations mentioned earlier, and indeed provides the exact 
path CP given enough time (in contrast to current approaches). Most importantly, 
our variance-reduction scheme, which combines and tailors control variate and im¬ 
portance sampling techniques in an original fashion to the problem at hand, enables 
the computation of very accurate estimates in a way compatible with real-time op¬ 
erations. This holds even when working with very small CPs, a regime in which 
a straightforward Monte Carlo method would require great computational expense 
to arrive at accurate estimates. Another key advantage of our algorithm is that it 
comes with an estimate of its variance, so that we have a measure of accuracy, un¬ 
like the aforementioned approximations. It is also trivially parallelizable and has the 
potential to be extended to very general controllers and uncertainty models. 

Our estimation algorithm enables a novel approach to planning under uncertainty, 
which we call Monte Carlo Motion Planning (MCMP)—our second contribution. 
MCMP proceeds by performing bisection search over CP and obstacle inflation, at 
each step solving a deterministic version of the problem with inflated obstacles. To 
demonstrate the performance of MCMP, we present simulation results that illustrate 
the correctness (in terms of feasibility), efficiency (in terms of path cost), and com- 
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putational speed of MCMP. From a conceptual standpoint, MCMP can be viewed 
as a planning analogue to MC approaches for robot localization ll^ . 

Organization: This paper is structured as follows. Section|^reviews some back¬ 
ground on MC variance reduction. Section [^formally defines the problem we con¬ 
sider in this paper. Section elucidates the shortcomings of previous path CP ap¬ 
proximation schemes. Sectionj^presents variance-reduction techniques for fast MC 
computation of path CP. Section [^presents the overall MCMP approach. Section]^ 
presents results from numerical experiments supporting our statements. Finally, in 
Section!^ we draw some conclusions and discuss directions for future work. 

2 Background on Monte Carlo Variance Reduction 

The use of Monte Carlo (MC) to estimate the probability of complex events is 
well-studied. In this section we will briefly introduce MC and the two variance 
reduction techniques that provide the basis for our main result in Section For 
more detail and other topics on Monte Carlo, the reader is referred to the excellent 
unpublished text EOl , from which the material of this section is taken. 

2.1 Simple Monte Carlo 

In its most general form, MC is a way of estimating the expectation of a func¬ 
tion of a random variable by drawing many independent and identically distributed 
(i.i.d.) samples of that random variable, and averaging their function values. Explic¬ 
itly, consider a random variable X C K” and a bounded function / : M" —M. For a 
sequence of m i.i.d. realizations of X, the central limit theorem gives, 

Aiv(o,T2), (1) 

as m —> oo, where denotes convergence in distribution, and A^(0, T^) refers to the 

Gaussian distribution with mean 0 and variance T^. This implies ^ YULi f 

E [/(X)] as m oo, where —^ denotes convergence in probability. 

In this paper, X will be a random trajectory controlled to follow a nominal path, 
and / will be the indicator function that a trajectory collides with an obstacle; call 
this collision event A. Therefore, the expectation in Equation Q is just E[/(iL)] = 

P(A). Denote this collision probability by p, and define Psimpie := T 4 L 1 f ■ 

Then can be consistently estimated by the sample variance of the f{X^''>), 

it (/(^^''^) - We)' ^ (2) 

as m —>■ 00 . Vsimpie := jm allows us to quantify the uncertainty in the CP estima¬ 
tor Psimpie by approximating its variance. The material from this subsection can be 
found with more detail in EOl Chapter 2]. 

2.2 Control Variates 

To reduce the variance of Psimpie^ we can use the method of control variates (CV). 
CV requires a function /;: K" —>• M such that 9 E[/i(X)] is known. Then if h{X^‘^) 
is correlated with f{X^'^), its variation around its (known) mean can be used to 
characterize the variation of around its (unknown) mean, which can then 
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be subtracted off from ^simple- Explicitly, given a scaling parameter value j3, we 
estimate p = E[/(X)] by, 

1 m 

Pp ■= - +i30 =p,i.,pie -/3(0 - 0), (3) 

m 

where 9 is the sample average of the h(X^'>). The optimal (variance-minimizing) 
choice of j3 can be estimated from the simulated data as 

m . m 

j3 := £(/(Z«)-p,i^pi,)(/,(Z«)-0)/£(/t(X«)-0)2. (4) 

1=1 ^ /=1 
We then use the CP estimator whose variance can be estimated by, 

1 m 

The data-dependent choice of p introduces a bias in that is asymptotically (in m) 
negligible compared to its variance, so we will ignore it here. As m oo, the vari¬ 
ance reduction due to CV can be characterized by Var(p^)/Var(psimpie) —> 1 — p^, 
where p is the correlation between f{X) and h{X). The material from this subsec¬ 
tion can be found with more detail in ll^ Section 8.9]. 


2.3 Importance Sampling 

When f{X) is the indicator function for a rare event A, as it is in this paper (we 
assume that in most settings, path CP constraints will be small to ensure a high like¬ 
lihood of safety), MC variance reduction is often needed, with importance sampling 
(IS) a particularly useful tool. Since p ^ 1, we can approximate the coefficient of 
variation (ratio of standard deviation to expected value) of the estimator Psi^pie as, 

v/Var(psimpie) _ v/p(l -p)lm ^ 1 1 

Vm y/p' 


(6) 


E [psimple] P 

This means that in order to get the relative uncertainty in Psimpie to be small, one 
needs m ^ 1/p which can be very large, and this is simply due to the rarity of 
observing the event A. IS allows us to sample X from a distribution that makes the 
event A more common and still get an unbiased estimate of p. 

Until now we have considered X to have some fixed probability density func¬ 
tion (pdf) P : M." ^ K>o- Denoting the expectation of /(X) when X has pdf 
P by E/)[/(X)], then for any pdf Q whose support contains that of P (that is, 
P{x) > 0 Q{x) > 0), 

PiX) 


Mnx)]= [ fix)Pix)dX: 

jR" 


' Li[x) 


f{X) 


Q{x)\ 


Therefore, letting be i.i.d. samples with pdf Q, the IS estimate and associ¬ 

ated variance estimate are. 


PQ-= 


1 


f{x‘'‘'’)P{X^‘'’) 




1 ^ ( f{X^''^)P{X^^) 


"S=1 Qix''"’) ~ Q{X^^) J 

If Q can be chosen in such a way that A is common and the likelihood ratio 
P{X)/Q{X) does not have high variance for X G A, then Var(pg) can be much 
smaller (orders of magnitude) than Var(psin,pie) for the same m. The material from 
this subsection can be found with more detail in ll^ Chapter 9]. 


-PQ 


(7) 
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2.4 Comments 

CV and IS may be combined into one estimator, summarized in Algorithm [T] 
the full mathematical details of which are contained in ll^ Section 9.10]. Although 
CV and IS can both be excellent frameworks for variance reduction in MC, there 
is no general method for selecting h or Q, and good choices for either one are ex¬ 
tremely problem-dependent. Indeed, the main contribution of this paper is to find, 
for the important case of linear dynamics and Gaussian noise, h and Q that make 
MC estimation of CPs converge fast enough for real-time planning. 


3 Problem Statement 

We pose the problem of motion planning under uncertainty with safety tolerance 
as a constraint separate from the path cost to be optimized. We consider robots de¬ 
scribed by linear dynamics with control policies derived as LQG controllers tracking 
nominal trajectories. These nominal trajectories are planned assuming continuous 
dynamics, but in order to make the computation of path CPs tractable, we assume 
discretized (zero-order hold) approximate dynamics for the tracking controllers The 
full details of the continuous vs. discrete problem formulations are rather standard 
and due to space limitations are provided in Appendix of the extended version 
of this paper 0. Briefly here, with denoting a multivariate Gaussian with 

mean pL and covariance matrix L, the system dynamics are given by 
x,+ i =Ax,-l-Bu,-fVf, V,y, = Cx,-fw,, w,-M^(0,W). (8) 
where x, S is the state, u, € is the control input, y, is the workspace out¬ 
put, and V, and w, represent Gaussian process and measurement noise, respectively. 
With deviation variables from a nominal trajectory defined as 5x, := x, — x™™, 
5uf := Uf — u™™, and 5yt := y? — y™™, for f = 0,..., T, the discrete LQG controller 
Lt5x,, with L, and 5x, denoting the feedback gain matrix and Kalman 
state estimate respectively, minimizes the tracking cost function 


7;=E 


r-i 


5xjF5xt -f ^ 5xJQ5x, -f 5ufR5ut 

t=i) 


The computation details of Lt and the dynamics of 5x, are standard and given in 
Appendix [A] 0 ; in the remainder of this paper we use only the notation that the 


combined state/estimate deviations evolve as multivariate Gaussians 


5x,; 5x, 


,/L (/ifjZ,) and for suitable definitions of Mt and Nt we may write 

^ = Mtliu-^t+x = M,L,mJ +Nt). 


5x, 


f+i 


(9) 


Let ^ohs be the obstacle space, so that ,^ree := is the free space. Let 

C and jco S <%ree be the goal region and initial state. Given a path cost 
measure c and letting xq, ... jXj- denote the continuous curve traced by the robot’s 
random trajectory (connecting the waypoints xq, ... ,xt) we wish to solve 


Discretized stochastic motion planning (SMP): 

c(x"™(.)) 


mm 

unom(.) 


s.t. P(xo,...,xj.n,£;bs ^0)<a 

u, 

Xfl -- ^{Xa°'^,Po), XT e c^Tgoal 

Equation 
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Note that the optimization is still over continuous-time nominal paths, which we 
discretize when computing the path collision probability P (xq, n ^obs 7 ^ 0 ). 

This formulation is inspired by mil El and represents a compromise between a 
POMDP formulation involving a minimization over the class of output-feedback 
control laws, and an open-loop formulation, in which the state is assumed to evolve 
in an open loop (i.e., no tracking). This can be justihed in two ways. One is that 
the general constrained POMDP formulation is vastly more complex than ours and 
would require much more computation. The other is that, in practice, a motion plan 
is executed in a receding horizon fashion, so that computing output-feedback poli¬ 
cies may not even be useful, since after a short time-horizon the environment map 
may have changed, requiring recomputation anyway. We note that the problem for¬ 
mulation could be readily generalized to a nonlinear setup and to any tracking con¬ 
troller (e.g., LQG with extended Kalman hlter estimation is essentially already in 
the same form)—indeed, one of the key advantages of the MC approach is that it is 
able to handle (at least theoretically) such general versions of the problem. However, 
in the present paper, we limit our attention to the aforementioned LQG setup. 

In the remainder of the paper, we discuss how to quickly and consistently (i.e., 
in a way that is asymptotically exact as the discretization step At ->-0) estimate the 
path CP appearing in equation ( [T0| l, and then we will employ MCMP to generate 
approximate solutions to the discretized SMP problem. 

4 The Problem of Computing Path CP 

In general, the key difficulty for planning under uncertainty (provided a proba¬ 
bilistic uncertainty model is given) is to accurately compute path CP. All previous 
approaches essentially rely on two approaches, namely: 

• Additive approach, e.g., lfT4l : using Boole’s inequality, i.e., P (U,A 

by which a path CP is approximated by summing pointwise CP, at a certain 
number of waypoints along the path, i.e., CP « LiCP,. 

• Multiplicative approach, e.g., a path CP is approximated by multiplying 
the complement of point-wise CP,, specifically CP « 1 — 11, (1 ~ CP,). 

There are three approximations inherent in both approaches: 

(A) The path CP is approximated by combining waypoint CP,’s. That is, no ac¬ 
counting is made for what happens in between waypoints. 

(B) The waypoint CP,’s are combined in an approximate manner. That is, in general 
there is a complex high-dimensional dependence between collisions at different 
waypoints, and these are not accounted for in either approach. In particular, the 
additive approach treats waypoint collisions as mutually exclusive, while the 
multiplicative approach treats them as independent. Since neither mutual exclu¬ 
sivity nor independence hold in general, this constitutes another approximation. 

(C) Each waypoint CP, is approximated (e.g., by using a nearest obstacle). This is 
usually done because integrating a multivariate density over an intersection of 
half-planes (dehning the obstacle set) can be quite computationally expensive. 


A fundamental limitation in both approaches comes from the interplay between 
approximations (A) and (B). Specihcally, while approximation (A) improves with 
higher-resolution waypoint placement along the path, approximation (B) actually 
gets worse, see Figure In Figure 1(a) although Obs 2 comes very close to the 
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(a) (b) 

Fig. 1 Illustration of the interplay between approximations (A) and (B). In (a), there are not enough 
waypoints to properly characterize the pathwise CP, while in (b), the waypoints may be too close 
to not account for their dependence. 

path, it does not come very close to any of the waypoints, and thus the pathwise 
CP will not be properly accounted for by just combining pointwise CPs. In Fig- 
ure |l(b)| the waypoints closest to Obsi will have highly-correlated CPs, which again 
is not accounted for in either the additive or multiplicative approaches. For the lin¬ 
ear Gaussian setting considered here, as the number of waypoints along a hxed path 
goes to infinity, the path CP estimate from the additive approach actually tends to oo, 
while that of the multiplicative approach tends to 1, regardless of the true path CP 
To see this, note that for any fixed path, there exists a positive number e > 0 such 
that CP; is larger than or equal to £ for any point on the path. Therefore, 
k k 

^CP;>^£^-=iA- l>l_pj(l_CP;)>l-(l-£)^^-^l, (11) 

i= 1 i= 1 

where k is the number of waypoints. In other words, both approaches are asymptot¬ 
ically tautological, as they upper-bound a probability with a number greater than or 
equal to one. An important consequence of this is that as the number of waypoints 
approaches inhnity, either approach would deem all possible paths infeasible with 
respect to any fixed non-trivial path CP constraint. This point is emphasized in Fig¬ 
ure]^ which compares true path CP to approximations computed using the additive 
and multiplicative approaches for two different paths, as a function of the number of 
waypoints along the path. Off the plotted area, the additive approach passes through 
an approximate probability of 1 and continues to infinity, while the multiplicative 
approach levels off at 1. Even with few waypoints, both approaches are off by hun¬ 
dreds of percent. The overly conservative nature of the multiplicative approach has 
been recognized in ED, where the authors replace approximate marginal pointwise 
CPs in the multiplicative approach with approximate pointwise CPs conditional on 
the previous waypoint being collision-free. While this is a hrst-order improvement 
on the standard approaches, the conditional pointwise probabilities are quite com¬ 
plex but are approximated by Gaussians for computational reasons, with the result 
that their approximate path CPs can still be off by many multiples of the true value, 
especially for small path CPs (which will usually be the relevant ones). The red 
curve in Figure [2(b)| and [2(d)] shows that the approximation of lISTIl . while a substan¬ 
tial improvement over the alternatives, can still be off by factors of 5 or more, and 
the discrepancy appears to be increasing steadily with the number of waypoints. 

A few comments are in order. First, there is nothing pathological in the example 
in Figure]^ as similar results are obtained with other obstacle conhgurations and in 
higher dimensions. Second, we note that approximations such as these may be very 
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2D Double Integrator: 1% CP 




(C) 


(d) 


Fig. 2 Illustration of path CP approximation schemes for two robotic systems where the true path 
CP is around 1% under the continuous controller. In (a) and (c), the blue curve represents the 
nominal path, the red boxes are the obstacles, and the purple ellipses represent 95% pointwise 
marginal confidence intervals at individual waypoints at discretizations with 102 and 104 points 
respectively. Panels (b) and (d) show the collision probability estimated by each approximation 
scheme as a function of the number of waypoints. Approximation (C) in all approaches is matched 
to their respective papers (additive: diiiiiiiia, multiplicative: |3, conditional multiplicative: 

dUHUISl). 


useful for unconstrained problems that penalize or minimize path CP, since they 
may measure the relative CP between paths well, even if they do not agree with the 
true path CP in absolute terms. However, to address the chance-constrained SMP, 
one needs an accurate (in absolute terms) and fast method to estimate path CP, which 
is one of the key contributions of this paper. Third, the additive approach is guaran¬ 
teed to be conservative with respect to approximation (B). That is, ignoring (A) and 
(C) (the latter of which can also be made conservative), the additive approach will 
produce an overestimate of the path CP. Although this can result in high-cost paths 
or even problem infeasibility, it is at least on the safe side. This guarantee comes 
at a cost of extreme conservativeness, and the two less-conservative multiplicative 
approaches have no such guarantee for any finite number of waypoints. Fourth, the 
limits in equation o apply to any uncertainty model supported on the entire con¬ 
figuration space, and even to bounded uncertainty models so long as the path in 
question has a positive-length segment of positive pointwise CP. 

In the next section, we will present a MC approach that addresses all three ap¬ 
proximations (A)-(C) stated earlier. Specifically, for (A), although collisions can 
truly be checked along a continuous path only for special cases of obstacles, Monte 
Carlo simply checks for collisions along a sampled path and thus can do so at ar¬ 
bitrary resolution, regardless of the resolution of the actual waypoints, so approxi- 
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mation (A) for MC has no dependence on waypoint resolution. For (B), the high¬ 
dimensional joint distribution of collisions at waypoints along the path is automat¬ 
ically accounted for when sampling entire realizations of the tracking controller. 
And for (C), since MC only has to check for collisions at specific points in space, 
no multivariate density integration needs to be done. 

5 Variance-Reduced Monte Carlo for Computing Pathwise CP 

5.1 Control Variates 

As discussed in Section a good control variate h for / should have a known 
(or rapidly computable) expected value, and should be highly correlated with /. 
As mentioned in the previous section, existing probability-approximation methods, 
while not accurate in absolute terms, can act as very good proxies for CP in that 
they resemble a monotone function of the CP. Coupled with the fact that such ap¬ 
proximations are extremely fast to compute, they make ideal candidates for h. 

Since even individual waypoint CPs are expensive to compute exactly for all 
but the simplest obstacle sets, we approximate the obstacle set locally as a union 
of half-planes, similar to ED- For each waypoint along the nominal path, we 

compute the closest obstacle points zf ^ and their corresponding obstacle half-planes 
such that none of these points are occluded by each others’ half-planes. “Close” is 
measured in terms of the Mahalanobis distance defined by the covariance matrix of 
the robot state at that waypoint, and the obstacle half-planes are defined as tangent 
to the multivariate Gaussian density contour at each close point. Mathematically this 
corresponds to at most one point per convex obstacle region (with S^obs = 

U" 1 i.e., 

zf^= argmin (a-x™”)^r-'(a-x™'"). 
aG 

We then approximate the pointwise probability of collision by the probability of 
crossing any one of these half-planes; this probability is approximated in turn by 
Boole’s bound so that an expectation is simple to compute. That is, we define hfi{X) 

to be the indicator that x™™ crosses the zf^ obstacle half-plane, and define h{X) = 
We note that considering multiple close obstacle points, as opposed to 
only the closest one, is important when planning in tight spaces with obstacles on 
all sides. Correlations between h and / in testing were regularly around 0.8. 

5.2 Importance Sampling 

From a statistical standpoint, the goal in selecting the importance distribution Q 
is to make the pathwise CP sampled under Q on the order of 1, while keeping the 
colliding paths sampled from Q as likely as possible under the nominal distribution 
P. From a computational standpoint, we want Q to be fast to sample from and for the 
likelihood ratio P/Q to be easy to compute. Our method for importance sampling 
constructs 2 as a mixture of sampling distributions Qq, - ^Qk —one for each close 
obstacle point along the nominal trajectory. The intent of distribution Qti is to 
sample a path that is likely to collide with the obstacle set at waypoint t. We accom¬ 
plish this by shifting the means of the noise distributions W, 0 < i < f, leading up 
to time t so that Eg,J5x,] = — x™™. To minimize likelihood ratio P{X)/Qti{X) 

variance, we distribute the shift in the most likely manner according to Mahalanobis 
distance. This amounts, through Equation (|^, to solving the least-squares problem 
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( 12 ) 


min 

s.t. [/ 0] [ ^ M,_r) AlJ-s = -x“ 

s=0 \ r=0 / 

and sampling the noise as ^ {AiJ,^,Ns) for 0 < s <t. 

We weight the full mixture IS distribution, with 9 = E[/t(X)], as 

That is, the more likely it is for the true path distribution to collide at f, the more 
likely we are to sample a path pushed toward collision at f. 

5.3 Combining the Two Variance-Reduction Techniques 

Due to space limitations, we do not discuss the full details of combining CV and 
IS here, but simply state the final combined procedure in Algorithm[T] 

Algorithm 1 Monte Carlo Path CP Estimation 


Require: Nominal distribution P, control variate h as in Section 5.1 9 := Ep[/r(A)], importance 
distribution Q as in SectionjS^ number of samples m 
1: Sample i.i.d. from Q 

2: Denoting the likelihood ratio ) := P(X^‘^ )/ Q(X^‘^), compute 


PQ = - 9g = A 




ir=i 

1^ -rv(')\r^V-(0\ 


= -j:f(r‘>)L(X^‘’}-Peh(X^‘’)L(X^‘>) + l3^9 
^ /=1 

= i I - Ps.p, - P -o)y 

3: return 


6 MCMP Algorithm 

With an algorithm for path CP estimation in hand, we now incorporate it into a 
simple scheme for generating high-quality paths subject to a path CP constraint. Al- 
gorithm|^describes the Monte Carlo Motion Planning (MCMP) algorithm in pseu¬ 
docode. 

The idea of MCMP is simple: solve the deterministic motion planning problem 
with inflated obstacles to make the resulting path safer, and then adjust the inflation 
so that the path is exactly as safe as desired. Note that in line [3 of Algorithm 
MC could be replaced by any of the approximations from Section ^ but the output 
would suffer in quality. In the case of the multiplicative approaches, the CP may 
be underestimated, in which case the safety constraint will be violated. More com¬ 
monly (for a reasonable number of waypoints), for both additive and multiplicative 
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Algorithm 2 Monte Carlo Motion Planning 

Require: Maximum inflation /max (e g- configuration space diameter), minimum inflation /min 
(e.g. 0), number of bisection steps r, path CP constraint a 
1: for / = 1 : r do 

2: Compute an (approximately) optimal path d using, e.g., an asymptotically optimal 

sampling-based motion planning (SBMP) algorithm, for the deterministic version of the 
problem with the obstacles inflated by (/min +/max)/2 
3: Compute a MC estimate p of the CP of a (set p = 0 if the previous step fails to find a 

feasible solution) 

4: if p > a then 

5: /min ~ {4nin “t“ fmax)/2 

6: else 

2: /max — (fmin “b 2max)/2 

8: end if 

9: end for 
10: return a 


approaches, the CP may be substantially overestimated. Although the resulting path 
will not violate the safety constraint, it will be inefficient in that it will take a costlier 
path than needed (or than the one returned by using MC CP estimation) in order to 
give the obstacles a wider berth than necessary. Another possibility is that the ob¬ 
stacle inflation needed to satisfy the conservative safety constraint actually closes 
off all paths to the goal, rendering the problem infeasible, even if it may have been 
feasible using MC estimation. 

It is worth pointing out that the tails, or probability of extreme values, of the 
Gaussian distribution fall off very rapidly, at a double-exponential rate. For instance, 
the O.OF* percentile of a Gaussian distribution is only about 20% farther from the 
mean than the O.F^’ percentile. In the Gaussian framework of this paper, this means 
that a path that already has a small CP can make its CP much smaller by only shift¬ 
ing slightly farther from the obstacles. Thus although the additive or multiplicative 
approximations may overestimate the pathwise CP by hundreds of percent, the cost 
difference between using them in line of Algorithm and using MC in line of 
Algorithm 1^ may not be nearly so drastic. However, it can be if the increased ob¬ 
stacle inflation required closes off an entire homotopy class, or renders the problem 
infeasible altogether. 

7 Numerical Experiments 

We implemented variance-reduced path CP estimation and MCMP in Julia Q for 
numerical experiments on a range of linear dynamical systems and obstacle sets, run 
using a Unix operating system with a 2.0 GHz processor and 8 GB of RAM. Many 
implementation details and tuning parameters have been omitted in the discussion 
below; the code for these results may be accessed at https : //github. com/ 
schmrlng/MCMP-ISRR15 

Figures]^ and [^display some example results for single integrator {x = u) and 
double integrator (x = u) systems in a two-dimensional workspace, and Table 
summarizes a range of statistics on algorithm performance in three-dimensional 
workspaces as well. The deterministic planning step (Algorithm]^ line[^ was ac¬ 
complished using the differential FMT* algorithm Il23l on a fixed set of nodes. By 
caching nearest-neighbor and distance data for these nodes (Offline Planning), the 
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(a) 


2D Single Integrator: 1% CP 



(b) 


2D Single Integrator: 5% CP 



(c) 


Fig. 3 Illustration of the MCMP algorithm output given a range of target path CPs for a 2D sin¬ 
gle integrator system. For these uncertainty parameters, we see that the precise safety tolerance 
value (between 1 %-20%) will correspond to a nominal solution in one of three distinct homotopy 
classes. The orange obstacle in (b) is added by the “block and backtrack” modification, discussed 
in Section^ to the basic MCMP bisection Algorithm]^ The black and green lines denote the close 
obstacle points and vectors defining their half planes respectively; only the pruned set is depicted. 


total replanning time over all inflation factors (Online Planning, essentially consist¬ 
ing only of collision checking) was significantly reduced. For the single integrator 
systems 2D SI and 3D SI, the planning problem is equivalent to geometric planning, 
which allowed us to apply the ADAPTIVE-SHORTCUT rubber-band-style heuris¬ 
tic for smoothing planned paths 0. Applying this smoothing heuristic ensures that 
the path CP varies continuously with inflation factor within a path homotopy class. 
Between homotopy classes the CP may be discontinuous as a function of infla¬ 
tion factor. If increasing the inflation factor increases the CP discontinuously, the 
bisection process is not affected; otherwise if the CP decreases (e.g. Figure (b) 
and (c)—the CPs are 0.3% and 1.5% respectively around the inflation factor which 
closes off the (c) route) the MCMP bisection algorithm may get stuck before reach¬ 
ing the target CP a (e.g. 1% in the case of Table[2row 2). To remedy this issue, we 
implemented a “block and backtrack” modification to Algorithm|^which blocks off 
the riskier homotopy class with an obstacle placed at its waypoint most likely to be 
in collision, and then resets the bisection lower bound for the inflation factor. This 
results in increased computation time, but returns a path with the goal CP in the end. 

We did not implement any smoothing procedure for the double integrator sys¬ 
tems. Each nominal trajectory is selected as a concatenation of local steering con¬ 
nections, subject to variance in the placement of the finite set of planning nodes. In 
practice, this means that path CP is piecewise constant, with many small disconti¬ 
nuities, as a function of inflation factor. If the bisection procedure terminates at an 
interval around the desired CP, we choose the path satisfying the safety constraint: 
this explains the mean True CP below the goal value in Table [T]rows 4 and 5. 

In order to speed up the Monte Carlo CP estimation, we prune the set of close 
obstacle points so that the ones that remain are expected to have their term in the 
mixture distribution Q sampled at least once. We note that this style of pruning does 
not bias the results; it only affects computation time and estimator variance. Addi¬ 
tionally, since during each MCMP run we only use the CP estimate for bisection. 
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Goal 
CP (%) 

Offline 
Planning (s) 

Online 
Planning (s) 

MC 

Time (s) 

Discretization 

Points 

Bisection 

Iterations 

MC 

Particles 

2D SI (A) 

I 

0.25 ± 0.03 

1.25 ± 0.24 

2.64 ± 0.83 

102.5 ± 0.8 

6.3 ± 1.5 

2085 ± 686 

2D SI (B) 

I 

0.27 ± 0.04 

2.48 ± 0.77 

4.65 ± 1.70 

116.5 ±0.7 

13.3 ±4.4 

2955 ± 1052 

3D SI 

I 

0.35 ± 0.03 

1.95 ± 0.76 

3.00 ± 0.89 

83.6 ± 1.4 

6.3 ± 3.1 

1667 ± 764 

2D DI 

I 

6.64 ±0.14 

2.86 ± 0.98 

5.82 ± 2.33 

107.7 ± 6.0 

8.6 ± 2.9 

2383 ± 952 

3D DI 

I 

20.90 ± 1.11 

6.27 ± 2.40 

7.45 ± 3.75 

71.7 ± 10.8 

7.8 ± 3.3 

2117 ±938 



Goal 
CP (%) 

Nominal 
Path Cost 

True (MC) 
CP (%) 

Additive 
Estimate (%) 

Multiplicative 
Estimate (%) 

Cond. Mult. 
Estimate (%) 

2D SI (A) 

1 

1.47 ± 0.00 

1.01 ±0.06 

22.67 ± 2.39 

20.35 ± 1.92 

2.04 ± 0.20 

2D SI (B) 

1 

1.69 ±0.01 

1.00 ±0.06 

12.88 ± 3.72 

12.10 ±2.97 

1.37 ±0.26 

3D SI 

1 

1.28 ±0.03 

1.00 ±0.06 

47.48 ± 7.98 

38.84 ±5.51 

2.15 ±0.23 

2DDI 

1 

7.20 ± 0.43 

0.67 ± 0.27 

15.04 ± 8.97 

13.78 ±7.59 

1.39 ± 0.68 

3D DI 

1 

9.97 ± 1.61 

0.66 ± 0.33 

12.26 ±5.88 

11.68 ±5.43 

0.59 ± 0.32 


Table 1 (MCMP in various state spaces). Results averaged over 400 MCMP runs: 20 runs each 
for 20 SBMP sample sets. SI and DI denote single and double integrator respectively. We aim to 
minimize arc-length for the SI systems, and a mixed time/control energy cost for the DI systems. 
2D SI (A) refers to the obstacle set in Figure]^ and 2D SI (B) refers to the obstacle set in Figure]^ 


we also save time by terminating the estimation procedure early when estimated 
estimator variance suggests we may do so with confidence. 

From Table we see that MCMP run times approach real time in a range of state 
spaces from 2-6 dimensions, on the order of 5-10 seconds total, excluding plan¬ 
ning computation that may be cached offline. This is accomplished even at a level 
of tracking discretization sufficient to approximate continuous LQG. Planning time 
and probability estimation time are similar in magnitude, indicating that the MC 
portion of MCMP is not significantly holding back algorithm run time compared to 
a faster approximation scheme, even in this single processor implementation. Com¬ 
puting the Monte Carlo path simulations (MC Particles) in parallel could greatly 
reduce that time. We note that the few thousand simulations required in total by 
MCMP would not be enough to certify, using simple Monte Carlo, that a path CP 
is within the interval (0.9%, 1.1%) even once, which highlights the effectiveness of 
our proposed estimator variance reduction techniques. 

As can be seen from the simulations, the accuracies of the additive, multiplica¬ 
tive, and conditional multiplicative approximations vary over problems and param¬ 
eters, even occasionally being quite accurate. At this level of discretization, we see 
that the conditional multiplicative approximation scheme is within a factor of 2 of 
the true CP value, but may either underestimate or overestimate depending on which 
of approximation (A) or (B) from Section|^has the stronger effect. This sheds light 
on a key difference between using MC to estimate path CP as opposed to its alter¬ 
natives: MC not only gives accurate estimates, but also comes with a standard error 
of that estimate, effectively allowing the user to know whether or not the estimate 
is a good one. On the other hand, there is no certification for the various other ap¬ 
proximations; they simply return point values with no information about how far 
from the truth they are. This difference is especially crucial given the overarching 
goal of this exercise, which is to come up with paths that are guaranteed to have 
a high probability of success and have low cost. The standard error estimates that 
come from MC can be used as a kind of certificate of accuracy that gives the user 
confidence in its value, while alternatives come with no such certificate. 
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8 Conclusion 

We have presented a computationally fast method for provably-accurate pathwise 
collision probability estimation using variance-reduced Monte Carlo. The variance- 
reduction techniques employ a novel planning-specific control variate and impor¬ 
tance distribution. This probability-estimation technique can be used as a compo¬ 
nent in a simple meta-algorithm for chance-constrained motion planning, gener¬ 
ating low-cost paths that are not conservative with respect to a nominal path CP 
constraint. Simulation results confirm our theory, and demonstrate that computation 
can be done at speeds amenable to real-time planning. 

This works leaves many avenues for further investigation, the foremost of which 
is parallelization. As noted earlier, a key feature of MC is that it is trivially paral- 
lelizable (which is not changed by CV or IS). As most of the computation time is 
spent computing likelihood ratios, which is mostly linear algebra, our technique is 
ideally suited for implementation on a GPU. Another future research direction is 
to extend this work to more general controllers and uncertainty models. Heavier- 
tailed distributions, compared to the Gaussian model addressed here, would require 
larger shifts in inflation factor to affect similar changes in path CP, making a non¬ 
conservative CP estimation procedure all the more important. Monte Carlo itself is 
extremely flexible to these parameters, but it remains to be seen if appropriate con¬ 
trol variates or importance distributions can be developed to speed it up. We note 
that the meta-algorithm mentioned in this paper is extremely simple, and can surely 
be improved upon, although there was not space in this paper to investigate all such 
possibilities. One potential improvement is to incorporate domain knowledge to dif¬ 
ferentially inflate the constraints, or to do so in an iterative or adaptive way, similar 
in spirit to El. Another improvement could be to make bisection search adaptive 
and to incorporate the uncertainty in the probability estimates. We also reiterate 
that the meta-algorithm can be used with any deterministic planning algorithm, and 
thus it is worth exploring which particular algorithms are best for different plan¬ 
ning problems and cost functions. Finally, although we use our MC method to solve 
the chance-constrained motion planning problem, it is in no way tied to that prob¬ 
lem, and we plan to test our method on other problems, such as minimizing CP or 
optimizing a objective function that penalizes CP. 
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A Problem Formulation 

As briefly discussed in Section]^ we consider the evolution of the robot’s path 
as a discrete approximation of an LQG controller tracking a continuous nominal 
trajectory. Here we provide the complete characterization of the problem setup. 

A.l Continuous-time formulation 

We assume the robot’s dynamics evolve according to the stochastic linear model: 


x(f) =Acx(f)+Bcu(f)+v(f), y(f) =Qx(f)+w(f), (13) 

where x(f) G is the state, u(f) G is the control input, y(f) is the observed out¬ 
put, and V ~ ^(OjVe) and w ~ ^{0,Wc) represent Gaussian process and measure¬ 
ment noise, respectively. Let ^obs be the obstacle space, so that :=KA^obs 
is the free space. Let C ^ree jcq G be the goal region and initial 

state, respectively. 

Let x"°"'(f) be a nominal solution, i.e., a solution to the deterministic version of 
the system’s equations: 

= A,x“”(f) y™”(f) = Qx"°”(f), 

where u"”™)?) is the nominal control input, y"°™(f) is the nominal measured output, 
Xq°™ is the (deterministic) initial state, x’’°™(7’) G ^goai, and T is the final time. 
Consider the deviation variables 5x(f) := x(t) — x"°™(f), 5u(f) := u(f) — ), 

and 5y{t) := y(f) — y'’™(f). The dynamics of the deviation variables are readily 
obtained as 

5x(f) =Acdx(t) +BcSu(t)+v(t), dy(t) = CcSx(t)+w(t), 

where 5x(0) is the initial condition. Consider the quadratic cost functional 


J = E 


5x(TfF5x(T)+J 5x{tf Q5x{t)+5u{tf Rdu(t) 


The functional J is minimized by the Linear Quadratic Gaussian (LQG) controller 
Chapter 9] = L{t) 5x(f), where L{t) is the linear quadratic regulator 


(LQR) state feedback gain and 5x(f) is the Kalman filter estimate of 5x(f) (we refer 
the reader to ll25l Chapter 9] for further details on the continuous formulation and 
provide below detailed equations for the discrete version of the problem). 

We are now in a position to state the problem we wish to solve. In words, we 
seek to compute a nominal path of minimum cost subject to the constraint that the 
stochastic dynamics driven by a reference-tracking LQG controller are collision- 
free with high probability. More rigorously, for a given nonnegative cost function c 
(e.g., length) acting on paths, the stochastic motion planning problem is defined as 


Stochastic motion planning (SMP): 


s.t. P(3f: x(f) 6 S^obs) < a 
u(t)=u"°”(0-f5u‘^‘2°(t) 

X(0) ~ >'(x|;°”,Po), X(r) e JTgoal 
Equation 


(14) 
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A.2 Discrete Approximation 

To solve the SMP problem, one is required to compute path collision proba¬ 
bilities P(3f; x(f) G ^obs)- To make the problem tractable, we consider a discrete 
formulation whereby the dynamics are given by 
x,+i =Ax,+Bu,+v,, V, -^(0,y), yf = Cx,+w,, w, ^ A^(0,W), (15) 
for a given fixed timestep Zif. In equation (fTSjl, we used the definitions 


1 

iy:= — 
At 


C:=Cc, V := 

The deviation variables are then defined as 5x, := x, — x™™, := u, — u"°™, and 

5yr ■=yt — y™™, for t — The cost function for the discrete LQG controller 

is given by 

r-i 

5xjF5xt + ^ 5xf Q5xt + 5ufR5ut 


J = E 


t=o 


and the LQG controller minimizing J is given in ll24l by = Li dxt, where the 

feedback gain is 

L, = -{R + B'^St+iB)-^B'^Sr+iA, 

St = e+A^ {St+i-St+iBiR+B^S,+iB)-^B'^St+i)A, St = F, 
and the Kalman estimate dynamics are 

5x,+i =A5xt+BL,5x,+Kt{5y — C5xr), 

K, =AP,C'^{W + CP,Cy\ P,+i ^V+A{P,-P,C^{W + CP,Cy^CPt)A'^, 
with Pq denoting the covariance of the initial state xq (and hence of 5xo). The com¬ 
bined system evolves according to 




_5X;+1 



5xt 

5xt 


+ 


V, 

K,-Wt 


= M, 


5x, 


5x, 


( 16 ) 


A BL, 

K,C A+BL,-KtC 

V 0 

0 k,wkI 

formula for simulating state trajectories, also represents a means for tracking the 

r5x,1 


where n, .yV [Q^Nt = 


. Equation ([Tfill, in addition to providing 


state uncertainty at each time step. Given that 

'5x,+i 


5xt 


JY we have that 


dx. 


i+y 


yV{ix,+i =M,pt,E,+ i =M,E,mJ +Nt). 


(17) 


Using this recursion we may compute the marginal distributions of each waypoint 

[■PoO] 


along the full LQG-controlled trajectory, starting from /Xq = 0 and Zq = q q ■ 

ting xo,... ,xt denote the continuous curve traced by the robot’s random trajectory 
(which connects the points xq, ... ,xt), the problem we wish to solve then becomes: 

Discretized SMP: 

min c(x‘’°”(-)) 

unoni(.) 


s.t. P(xo,...,xj.n,£;bs 7 ^ 0) < a 

u, 

xo -- yY{xy^,Po), XT e c^Tgoal 

Equation 


(18) 
























